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00 ! ABSTRACT 

The dominant constituents of the Universe's matter are beheved to be colhsionless in 
nature and thus their modeUing in any self-consistent simulation is extremely impor- 
tant. For simulations that deal only with dark matter or stellar systems, the conven- 
tional N-body technique is fast, memory efficient, and relatively simple to implement. 
However when extending simulations to include the effects of gas physics, mesh codes 
<~i . are at a distinct disadvantage compared to SPH codes. Whereas implementing the 

N-body approach into SPH codes is fairly trivial, the particle-mesh technique used in 
mesh codes to couple collisionless stars and dark matter to the gas on the mesh, has 
• a series of significant scientific and technical limitations. These include spurious en- 

I tropy generation resulting from discreteness effects, poor load balancing and increased 

^ ■ communication overhead which spoil the excellent scaling in massively parallel grid 

codes. 

In this paper we propose the use of the collisionless Boltzmann moment equations 
as a means to model the collisionless material as a fluid on the mesh, implementing it 
\^ ' into the massively parallel FLASH AMR code. This approach which we term "collision- 

\ less stellar hydrodynamics" enables us to do away with the particle-mesh approach 

■ and since the parallelisation scheme is identical to that used for the hydrodynamics, it 
l/^ ■ preserves the excellent scaling of the FLASH code already demonstrated on peta-flop 

machines. 

We find that the classic hydrodynamic equations and the Boltzmann moment 
\ equations can be reconciled under specific conditions, allowing us to generate analytic 

solutions for collisionless systems using conventional test problems. We confirm the 
validity of our approach using a suite of demanding test problems, including the use of 
a modified Sod shock test. By deriving the relevant eigenvalues and eigenvectors of the 

■ Boltzmann moment equations, we are able to use high order accurate characteristic 
' tracing methods with Riemann solvers to generate numerical solutions which show 
, excellent agreement with our analytic solutions. We conclude by demonstrating the 

ability of our code to model complex phenomena by simulating the evolution of a 
two armed spiral galaxy whose properties agree with those predicted by the swing 
amplification theory. 
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1 INTRODUCTION 

Galaxies are complex systems consisting of a multitude of 
components that often require different approaches for both 
analytical and numerical study. The gaseous component 
can be described with the classic hydrodynamics equations 
(CHE) thanks to frequent collisions between constituent 
particles that act to isotropise the local pressure. On the 
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other hand, stellar and dark matter particles lack collisions 
which effectively means the local pressure is anisotropic and 
should be described by a stress tensor. Although this modi- 
fication is rather straightforward to implement in the CHE 
(see e.g. lSamland et al.lll997l : IVorobvov fc Theisll2006l ). this 
approach has mostly been applied to analytical and semi- 
anal ytical studies of growing i nstabilities in stellar systems 
(e.g. iBinnev fc TremaindlToSTh . 

When it comes to numerical simulations of collisionless 
galactic components, N-body methods have often been the 
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method of choice fe.g. lAthanassoulalll984l '). One owes this to 
the relative ease with which they can be implemented and 
one can start modelling three-dimensional coUisionless sys- 
tems - often with rather small numbers of particles. In par- 
ticular, large dark-matte r-only N-body simulations (e.g. the 
Millennium simulation, Springcl 2005) have led to signif- 
icant advances in our understanding of cosmological struc- 
ture formation, allowing some of the founding tenets of the 
current paradigm to be tested. 

As simulations have grown in complexity, incorporating 
the effects of baryonic matter, it has become necessary for 
theorists to develop a wider range of techniques to model 
the gaseous component. For this, there exist two dominant 
approaches; that of Smooth Particle Hydrodynamics (SPH) 
and alternatively, grid codes. Whilst both approaches agree 
for simple tests where analytic solutions are available, dis- 
crepancies are still often found when modelling more com- 
plex phenomena. SPH is very robust and memory efficient, 
allowing large cosmological runs to self-consistently model 
the effects of gas physics and coUisionless m atter (see for 
example the OWLS an d GIMIC simulations; ISchave et al.l 
I2OO9I : ICrain et al.ll2009l ). However, due to the higher shock 
capturing powers and more flexible refinement criteria of 
mesh codes, along with the tendency of SPH to suppress 
the for mation of hydrodynamic instabilities an d turbulent 
mixing l|Agertz et al.ll2007l : iMitcheU et all2009l ). grid codes 
present a powerful alternative. In particular, grid based 
schemes are better suited to modelling small-scale physical 
processes such as heat conduction and radiative transfer. 

Unfortunately, when coupling the coUisionless matter 
to the coUisionally dominated baryonic matter, it begins to 
become clear that the N-body approach is not always ide- 
ally suited to grid codes. In SPH the gas is discretised into 
particles of a given mass, allowing the effects of gravity to 
be calculated using an N-body method in a very convenient 
self-consistent manner for all components. Grid codes how- 
ever require the mapping of the coUisionless particle masses 
to the mesh, from which the global density field can be used 
to obtain the gravitational potential. This can be done us- 
ing the multigrid technique (|Frvxell et al.|[2000l ) with either 
a nested grid or Adaptive Mesh Refinement (AMR). Alter- 
natively a Fourier Transform or a hybrid of the two can 
accelerate this process. Once the gravitational acceleration 
is computed on the grid, it is then interpolated back to the 
particles. These are updated using a leapfrog method. This 
particle-mesh approach, a mapping of discrete particle prop- 
erties to and from the mesh, represents a series of scientific 
and technical limitations to grid codes which degrade the 
physical reliability of their results and hampers the scalabil- 
ity of grid based simulations. These include: 

• Spurious generation of entropy - The particle-mesh 
technique allows numerical noise to be introduced if insuf- 
ficient particles are present i n a given resolution element or 
"cell." It has been argued bv lSprin geJ (^lO") that in regions 
where the mesh is over-sampled relative to the number den- 
sity of particles, then some cells may receive mass mapped 
from particles whilst their neighbours may not. This can 
lead to strong variations in the local density field when real- 
istically the field should be smooth and continuous. This can 
drive spurious weak shocks and turbulence on small scales, 
generating an unphysical entropy excess. ISpringell (|2010l ') 



cites this as the origin of the higher entropy cores seen in 
galaxy cluster simulations when run using grid codes, com- 
pared to thos e run with SPH codes which have cu spy low 
entropy cores (|Frenk et aj|l999l : iMitcheU et aj|2009l) . 

As actually these particles should represent a continuous 
mass field, this effect can be limited by mapping the dis- 
crete mass of a given particle over a larger region using some 
pre-define d kernel function (for example the "Cloud-in-cell" 
technique: iHarlow. Ellison fc Rei"dlll964l ). this however leads 
to more technical problems. 

• Increased communication overhead - In massively par- 
allel programs, a large amount of effort is often put into 
minimising communication between different nodes over the 
network. Such communication is relatively time consuming, 
with latencies (the time taken to establish and terminate 
a communication) many orders of magnitude greater than 
the clock speed of the processor. As such, many calcula- 
tions could be performed whilst waiting for a given group of 
processes to communicate, and if there is a list of different 
processes to communicate with, the time adds up. There- 
fore any common communication scheme and grouping of 
messages, along with a reduction in the number of differ- 
ent nodes to which we must communicate with, presents a 
significant speed up in the code. 

When particle masses are mapped using a kernel, there 
will inevitably be some overlap with neighbouring cells 
which may or may not reside on different nodes. Although 
communication over the network is natural for any parallel 
simulation, it is lower when dealing with a system which ad- 
vects only fiuxes through cell surfaces instead of a system 
which can map mass to any neighbouring cell, whether it 
is directly adjacent to a face or merely in contact with an 
edge or vertex of the cell. This is demonstrated in figure [T] 
in which a schematic layout of two dimensional (2D) block 
of cells (as used in the FLASH code) is shown along with the 
dashed outline of the neighbouring blocks which surround it. 
The internal cells in the block are shown in red and a sur- 
rounding layer four guardcells deep contains boundary data 
copied from neighbouring blocks. Grey cells are guardcells 
which do not need to be updated or communicated for the 
given scheme, whilst green cells are guardcells which need 
to be updated or exchanged with neighbours. A 2D hydro- 
dynamic simulation (top panel) requires the communication 
of boundary data to compute the four fluxes through its cell 
faces with the grey corner guardcells not being used. The 
bottom panel however, shows how mapping of particle mass 
over a kernel of finite size can lead to particle properties be- 
ing mapped to any of the surrounding guardcells. This data 
then needs to be communicated to the internal cells of the 
blocks to which they correspond. Thus in 2D the communi- 
cation overhead is doubled, whilst in 3D it can increase by a 
factor of 3.3 (6 face fluxes / 20 edge and corner cells to which 
mass may be mapped). Sophisticated space flUing curves can 
help optimise the distribution of blocks (or individual cells) 
across nodes so that as many adjacent neighbours as possible 
reside on the same processor, minimising inter-node commu- 
nication. However increased overhead remains unavoidable, 
especially as we always remain bottle-necked by the slowest 
processor if the simulation is to proceed in lock-step. 

• Poor load balancing - The use of particles in conjunction 
with a mesh also presents a load-balancing issue as many 
forms of astrophysical structure formation result in the nat- 
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Figure 1. Diagrammatic representation of a block of cells and 
the required communication overhead for a flux based scheme 
(top panel) and a particle-mesh scheme (lower panel). The inter- 
nal cells of the block are shown in red. Surrounding guardcells 
which contain boundary data mapped from the internal cells of 
neighbouring blocks (visible as dashed squares) are shown in grey. 
Those guardcells which need to be updated after every step for 
the given scheme are highlighted in green. In the top panel, only 
flux data through the cell faces needs calculating and therefore 
only guardcell data directly adjacent to these surfaces needs to be 
communicated. Thus in 2D, only four of the eight neighbouring 
blocks need to be communicated with. However, the particle-mesh 
algorithm (bottom panel) which maps particle properties to the 
mesh, over a finite region with some Kernel, allows data to be 
mapped into all of the guardcells. Therefore it is necessary to 
perform a full guardcells exchange and to update the correspond- 
ing internal cells of all neighbouring blocks. This results in a much 
greater communication overhead. 



ural concentration of the majority of mass in dense com- 
pact regions. This leads to too many particles accumulating 
in one or two cells whilst the remaining cells have few or 
none, even if the initial particle distribution is fairly uniform. 
Ultimately, the mass field ends up poorly sampled in the 
under-dense regions of the simulation whilst the mapping 
of properties to and from the mesh is performed by a lim- 
ited number of cells within the densest regions. As already 
mentioned, the hydrodynamics requires that as many neigh- 
bouring cells as possible be stored on the same processor. 
Therefore these heavily occupied dense cells are all stored 
on the same few processors leading to extremely poor load 
balancing. Although distributing these dense blocks more 
evenly would improve the load balancing for the particle- 
mesh algorithm, it would ruin the load balancing for the 
hydrodynamics, creating an impossible system to optimise. 

Fortunately the use of the collisionless Boltzmann mo- 
ment equations allows us to overcome all of these limitations 
since it allows the stellar and dark matter components to be 
represented as a collisionless fluid in a near identical manner 
to the standard baryonic gas. This removes the need to de- 
velop a separate parallelisation scheme for the particle-mesh 
approach and allows us to use the same well parallelised hy- 
drodynamics scheme. Instead of communicating guardcell 
data for the gas and stellar material separately, the data 
can now be grouped and communicated all at once, minimis- 
ing the latency overhead and maximising bandwidth usage. 
Ideal load-balancing is also achieved as the time taken to 
calculate the inter-cell fluxes is independent of the amount 
of collisionless mass within a cell. Thus processors will not 
be left idle whilst others calculate, provided that the cells are 
evenly distributed across processors. Given that the hydro- 
dynamics in the FLASH AMR code we use has be en shown 
to scale well for tens of thousands of processors ()Antvpasl 
2006), then by using the same parallelisation scheme for the 
collisionless material, we naturally preserve this scaling. In 
an era of increasingly large and complex simulations, this 
scalability is vital in keeping performance in-line with scien- 
tific requirements. 

Most importantly, as the density of the collisionless 
material is now represented in each cell continuously, this 
removes the potential for spurious generation of entropy 
through discreteness effects introduced when mapping par- 
ticle properties to the mesh. It also removes limitations in- 
volved when converting gas into stars of discrete masses - 
problematic if cells contain insufficient mass but yet have 
densities that satisfy star formation criteria. 

We present in this paper our new implementation of 
the Boltzmann moment equations which allows collisionless 
material to be modelled as a collisionless ffuid on the mesh. 
A major application of this technique will be in modelling 
the behaviour of large collections of stars, such as those in a 
galaxy, as a collisionless fluid in much the same way that we 
model gas hydrodynamics in grid codes. We will therefore 
refer to it from here on as "collisionless stellar hydrody- 
namics." We begin in § [2] with a derivation of the zeroth, 
first and second order moment equations of the collisionless 
Boltzmann equation and then proceed in § |3] to outline the 
FLASH hydrodynamic code (S lS.lfl and the different numer- 
ical schemes we use to numerically integrate the equations. 
These include both a simpler Riemann solver free technique 
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(ij |3.2|) and a more sophisticated Riemann solver scheme with 
characteristic tracing fi; l3.3|l . 

In § U we outline our discovery of how standard hy- 
drodynamic test problems for which analytic solutions are 
known, can be modified under special conditions to repro- 
duce the results of a coUisionless fluid. This enables us to 
use the Sod-shock-tube test problem to test our results, the 
findings of which are given in § 14.11 We continue our tests 
with both a pressure-free spherical collapse test (§ 14. 2p and 
an anisotropic collapse with non-isotropic dispersion tensor 
in § 14.31 As a more complex test of our coUisionless stellar 
hydrodynamics, in 14. 41 we track the growth of a two armed 
spiral in a stellar disc, contrasting the resulting structure 
with the predictions of the swing amplification theory. 

We then conclude our findings with an overview of our 
results, a discussion of the runtime performance of our code 
and the future implications of our work in § [5] 



2 BOLTZMANN MOMENT EQUATIONS 

The dynamics of coUisionless particles are fundamentally 
described by the coUisionless Boltzmann equation for the 
distribution function of particles in the six dimensional 
position- velocity phase space f{t, x, v) 



dt * dxi 



+ ai 



dl 

dvi 



= 



(1) 



where a; = dvi/dt is the acceleration due to gravity and 
curvilinear terms (in non-Cartesian coordinate systems). In 
this paper, we use Cartesian coordinates and hence i = 
(x,y,z). Equation (lU is a six-dimensional, time-dependent 
partial differential equation an d although its direct nu meri- 
cal solution is possible (see, e.g. Yosh ikawa et al.ll2012l ) high 
resolution simulations are prohibitively expensive even for 
the most powerful supercomputers. N-body methods cir- 
cumvent this problem by solving a system of ordinary dif- 
ferential equations for the dynamics of individual particles, 
which on galactic scales often represent a cluster of objects 
rather than individual physical entities. In contrast to this 
discrete mechanics approach, a continuous mechanics ap- 
proach is based on taking the first three velocity moments 
of equation More specifically, equation ([l]) is multiplied 
by ten quantities m,mvi,ViVj , where i = {x,y,x), and in- 
tegrated over the velocity space d^v = dv^dvydv^ to ob- 
tain partial differential equations for the volume density 
p = J mfd^v, the mean velocity u — J mfvdPv, and 
velocity dispersions a^j = / mf {vi — Ui){vj — Uj) dPv of 
coUisionless particles. 

The resulting continuity, momentum, and velocity 
dispersion equations, named Boltzmann moment equations 
(BME), are as follows. 



(2) 



d d 

(pui) + - — {pui ■ Uj + Tlij) - poi = , (3) 



dt 



dt 



{EijUk + po%Ui + poi^uj) 
pUiOj — pUjOi = . 



(4) 



Here, Eij = Ilij -I- pUiUj and Hij — pa^j is the symmetric 



velocity dispersion tensor. All indices obey the usual Ein- 
stein summation rule. Equations ©-Q are closed by us- 
ing the usual zero-heat-Uux approximation and setting the 
third-order moments to zero, Qijk = p~^ S f {""i ~ ~ 
Uj){vk — Uk)d'^v = 0. We discuss the applicability of this 
approximation in Appendix B. 

Equations Q and ^ are in fact the Jeans equations 
(e.g. iBinnev fc Tremaing 19871 ) . from which the classic hy- 
drodynamics equation are derived assuming a diagonal and 
isotropic form of the stress tensor Hij and introducing the 
gas pressure P = 1/3 X] Pf^u Bind internal energy per unit 

i 

volume e = 1/2 pa^j , where i and j axe the number of 

i 

translational and total (translational plus internal) degrees 
of freedom. In the Boltzmann moment equations (BME), no 
simplifying assumptions regarding the form of the stress ten- 
sor are made. Due to this reason, every component of Hij 
needs to be evolved in time separately. Equation ((4]) does 
that for the quantity Eij — Hij + pUiUj, which can be re- 
garded as an analogue to the total energy £ + l/2p|u|^ in the 
CHE. In total, one has 10 equations for ten unknown quan- 
tities: p, pUi, and paij. We note that equations ©-(ll} can 
be convolved to the usual CHE if an isotropic and diagonal 
form of Hij is assumed (see Appendix A) . 

The fundamental similarity of the BME with the CHE 
makes it straightforward to couple coUisionless (stars, dark 
matter) and coUisional (gas, dust) galactic subsystems in one 
grid-based code. Below, we provide a detailed implementa- 
tion of two numerical solvers which can accurately follow the 
behaviour of a coUisionless fluid in the FLASH AMR, code. 



3 NUMERICAL SCHEMES 

A full solution of the coUisionless Boltzmann equation re- 
quires a detailed knowledge of the velocity phase space dis- 
tribution function for a coUisionless fluid. Unfortunately no 
known solution is currently available and therefore the first 
scheme we choose to implement is one which neither re- 
quires any detailed knowledge of the systems equation of 
state, nor any consideration of the behaviour of the dif- 
ferent entropy and pressure waves around cell interfaces 
as would be used by more sophisticated Riemann solver 
schemes. For this purpose we adopt the unsplit solver of 
iKurganov fc Tadmoij (|2000l ) for hyperbolic partial differen- 
tial equations (KT scheme hereafter). The scheme is sec- 
ond order accurate in both space and time using a simplis- 
tic Runge-Kutta midpoint technique to ensure that the so- 
lution is second order accurate in time. This removes the 
need to deconvolve the moment equations into their sep- 
arate eigenvectors within the Riemann interaction fan. De- 
spite being Riemann solver free, leading to an averaging over 
interactions within the Riemann fan, when mod elling a con- 
ventional ideal gas iKurganov fc Tadmoij (|2000D have shown 
that it possess very low numerical diffusion and can accu- 
rately capture sharp shocks and contact discontinuities. This 
makes it an ideal means to test the reliability of higher order 
more sophisticated schemes and as it is also highly extensi- 
ble, higher order moments of the Boltzmann equation can 
be readily incorporated. 

The second scheme we implement involves a more accu- 
rate modified state reconstruction using a new set of eigen- 
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vectors so that Riemann solvers can be directly applied. Un- 
like the KT scheme, the Riemann solver approach achieves 
second order accuracy in time by reconstructing the time 
averaged fluid properties (primitive or characteristic) us- 
ing more detailed knowledge of the eigenvector structure of 
the fluid equations. This grants the shock superior resolving 
powers, allowing sharper changes in the fluid variables to 
be tracked with less numerical diffusion. The scheme is also 
found to be relatively efficient compared to more dissipative 
schemes. As far as the authors are aware, this represents 
the first time that direct solution of the Riemann problem 
for an anisotropic collisionless fiuid has been achieved, with 
previous efforts focussing on either the use of less complex 
schemes such as the aforementioned KT scheme or alter- 
natively more complete direct integration of the collision- 
less Boltzmann equation which represents a more significant 
computational overhead. 



3.1 The FLASH Code 

FLASH was developed at the Alliance Centre for Astro- 
physical Thermonuclear Flashes as part of a DOE grant. 
Originally intended for the study of X-ray bursts and su- 
pernovae, it has since been adapted for many astrophysical 
conditions and now includes modules for relativistic hydro- 
dynamics, thermal conduction, radiative cooling, magneto- 
hydrodynamics, thermonuclear burning, self-gravity and 
particle dynamics via a particle-mesh approach. In this 
study we use FLASH version 3.2. 

The code is massively parallel, sc aling well over more 
than tens of thousands of processors (|Antvpasl 120061 ') ■ and 
is highly modular enabling several different solvers for the 
hydrodynamics and stellar dynamics to be introduced. 

The code features an Oct-tree structure with the simu- 
lation being divided into "blocks" , each of which contains a 
fixed number of cells; Nceiia ~ 16 in each dimension for our 
simulations. In addition to these cells, an additional four 
"guardcells" either side of the internal cells are required, 
which contain a copy of the properties of the neighbouring 
blocks in-case they are stored on a different node. 

The code is capable of adaptive mesh refinement, in- 
creasing the resolution in regions where a user defined re- 
finement criteria is satisfied and conversely removing refine- 
ment in regions where excessive refinement is both unnec- 
essary and wasteful. Refinement is achieved by sub-dividing 
each "parent" block into 2^^°"^ "child" blocks. Those at 
the highest level of refinement for their region of space are 
termed "leaf blocks." The code enforces a maximum jump 
of two between levels of refinement. At refinement level I, a 
fully refined AMR grid will have a total of A'^^fjj"^' cells on 
a side. 

To identify regions of rapid fiow change, FLASH'S refine- 
ment and de-r efinement criteria can incorporate the adapted 
iLohnej (|l987l ) error estimator. This calculates the modified 
second derivative of the desired variable, normalised by the 
average of its gradient over one cell. With this applied to 
the density as is common place, we impose maximum and 
minimum levels of refinement; Imax and Imin- 



3.2 The Kurganov and Tadmor Scheme 

We implement into the FLASH AMR code the semi-implicit 
numerical advection scheme of iKurganov fc Tadmon (2003). 
We briefly outline the scheme below but refer readers to 
their paper for a more in-depth discussion. In conjunction 
with the semi-implicit scheme, we use a Runge-Kutta inte- 
gration scheme for the temporal integration using the fol- 
lowing time derivative for the fiuid variables, u (eqn 4.2 in 
iKurganov fc Tadmoill2000l '). 



d 



Hi+l/2{t) — Hi_i/2{t) 

Ax 



where the numerical fiuxes are defined as. 



Hi+i 



/2 



/ktn/2(*))+/(-r+i/2w) 



2 



(5) 



(6) 



/2 "i+1/2 



Here Mi_i/2 and represent the fiuid variable val- 

ues at the left and right side of cell i with the indices -I- and 
— representing the interpolated values to the left and right 
sides of the respective cell interfaces. The fluxes f{u) are 
those taken from the derived BME (gl O and g} . The quan- 
tity a represents the "spectral frequency" of the system; 



^{p(|{k^i/2; 



9f, + . 



(7) 



As this is the speed with which information is propa- 
gated, it is unsurprising that this equates to the effective 
sound speed of the collisionless fluid. This can be computed 
component wise with no loss of numerical accuracy, or al- 
ternatively as we show in §[4] that it is possible to reconcile 
the BME with the CHE, we can use the conventional sound 
speed formula ((8]) in a directionally dependant manner. This 
makes use of a value for 7 = 3 and the dispersion tensor Ujk 
which can be treated as a directionally dependant pressure 



jk 



(8) 



The scheme can easily be extended to arbitrary orders 
of te mporal and spatial accuracy (see IKurganov fc Levvl 
|2000l ). however we choose to implement a second order 
accurate Runge-Kutta solver, providing a good compro- 
mise between accuracy and computational cost. Higher or- 
der schemes would require additional guardcell updates at 
the many midpoints, increasing communication overhead, 
whereas the second order accurate scheme can perform the 
update using just the information in locally stored guard- 
cells. 

The only modiflcation which needs to be made in order 
for the scheme to function well with the adaptive mesh, is 
that flux conservation be checked at refinement boundaries. 
Fluxes calculated for higher resolution cells are considered 
to be more accurate than those at lower levels of refinement. 
Therefore the mean flux from the four higher resolution cells 
on a reflnement boundary is calculated and passed to the 
neighbouring block at the coarser level of reflnement to be 
used in place of the locally calculated flux. 
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3.3 Accurate Solution of the Riemann Problem 

We implement our Riemann solver based coUisionless stel- 
lar hydrodynamics module into FLASH using the MUSCL- 
Hancock dimensionally unsplit scheme which has already 
been successfully imple mented for the C HE including 
Magneto- hydrodynamics (|Lee fc Deanell2009l '). We adopt an 
unsplit approach over a less mem ory intensive split version 
(e.g. PPM of .Colella &i Woodwar d 1984) given its superior 
ability to preserve flow symmetries. This is necessary given 
the highly non-linear nature of the BME which would oth- 
erwise quickly exaggerate any anisotropics. As an aside, we 
also note that we have significantly reduced the memory con- 
sumption of the default MUSCL-Hancock scheme provided 
with FLASH through more efficient memory management, 
ensuring that the increased memory overhead of using the 
unsplit scheme is equivalent to only a single extra AMR 
block. The results for the reduced memory version are of 
course bit- wise identical to the default release. 

Our coUisionless scheme differs from the standard hy- 
drodynamics scheme primarily through our determination 
of the new eigenvalues and eigenvectors which are required 
for the characteristic tracing step. This allows for the calcu- 
lation of the time averaged fluid properties which can then 
be used to calculate the second order accurate fluxes using a 
conventional Riemann solver such as the approximate HLL 
famil y of schemes inc luding the HLLC solver (s ee, e.g. iTord 

or more ac- 



1 19991 : iToro. Spruce fc Speares 
curate Roe scheme l|Roel 1981 



igg^lQuirkHigg^ ) 



We begin by expressing the zeroth, first and second or- 
der moments of the Boltzmann equation (see equations [51 El 
and[4| in compact matrix notation form (neglecting gravity 
for the moment). 



equations for the primitive quantities; density, bulk velocity 
and velocity dispersion: 



W : 



G(W) 



pVy 

VxVy 

Vy'"y + o-yi 

VzVy 

P<^lx'"y 
P<^ly'"y ■ 
P<ylzVy 



F(W) 



VxVx + ' 

VyVx 

VzVx 
P<^xx-"x 

Pf^^y^x 

Vx 



t2 

XX 



-IP'^x 



yy"y 



H(W) : 



pa 

pVz 

VxVz 

1>y1>z 

P'^lx^z 

pa^yVz 

L P'^Z 



■ Vz 



y^p^^z 
(11) 

where 7 = 3 (see §[4|. 

For brevity we now consider just the a;-directional fiux 
vectors in the conserved quantities, F(U) and primitive vari- 
ables, F(W). As the components of the primitive fiux vec- 
tors /i are each functions of the individual primitive quan- 
tities Mi, the Jacobian of the system in the x-direction, 
A(W)a;, can be expressed in terms of the partial derivatives 
of F(W) with respect to W: 



A(w) = af/aw = 



dfn/dwi 



dfi/dwn 



Ut+F(U), + G(U), +H(U)z 







(9) 



(12) 

The eigenvalues \i of the system are then obtained by 
solving for the roots of the polynomial. 



with the conserved advection quantities U and their corre- 
sponding fluxes F(U), G(U) and H(U) in the respective x, 
y and z-directions being. 



U : 



G(U) 





P 






PVx 






pVy 






pVz 


) 




Exx 






^yy 






. E,, _ 




pVy 


pv 


xVy 


pVyVy + pa^y 


pv 


zVy 


Exx'^y 


EyyVy + 2pa^y 


EzzVy 



F(U) 



2pa: 



Vx 



H(U) 



PVx 

pVxVx+ pal^ 

pVyVx 

pVzVx 
Exx^ 

^y. 

EzzVx 
pVz 
pVxVz 

pVyVz 

pVzVz - 
ExxVz 



2 I 

XX *- 



^yy 



Vz 



■ '2p<^lz'"z 

(10) 

where Ejk = pv^^, + pa^}^. For simplicity, in this paper we 
assume a diagonalised velocity dispersion tensor in the above 
flux vectors, neglecting off diagonal terms, ie. ajk = V j 7^ 
k. As the dimensions of the matrix and therefore complexity 
of the problem grow quickly with the increasing order of 
moments that we consider, we leave the tracking of the off 
diagonal terms and higher order moments to our next paper. 

By taking the conservative equations and expanding 
the derivatives, it is possible to express equivalent advection 



All = , 



(13) 



where I is the identity matrix. Physically, the eigenvalues 
represent the speeds of signal propagation within the system 
and are thus important for determining the extent to which 
ffuid bounding a cell interface affects the flux through it over 
the time step. Unlike a coUisionally dominated fluid which 
has flve eigenvalues, the (diagonalised) coUisionless fluid has 
seven to account for the anisotropy in the dispersion tensor 
(For a non-diagonalised version, this increases to ten). We 
determine the eigenvalues for the a:-direction to be 



A : 



f^xx , ^x , Vx , , ~r CLxx , ^x , Vx\ 



(14) 



with Uxx being the sound speed in the a;-direction. The di- 
rectionally dependant sound speed, Sjfc, is in close analogy 
to that of a standard coUisionally dominated gas. 



70" 



(15) 



The right eigenvectors of the Jacobian, R,''' , are those 
that obey the relation AR''> = AiR.''^ whilst the left 
eigenvectors of the Jacobian, L'*', are those which satisfy 
L'^'A — AiL^''. Using the Jacobian for the primitive vari- 
able matrix equations. 
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A(W), = 
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Vx 



(16) 



and the equivalent eigenvalues (|14p , the eigenvector matrices 
are found to be: 



p 








1 








p 



















(^xx 








-1 
p 


























j_ 

p 








2 

pa-xx 

















palx 





-1 

p 
































1_ 






(17) 



-1 








1 




















~p 








~p 























-1 














p 


























p 


1 

2a j; a: 








1 

ap^L 









(18) 



With the general form of the eigenvalues and corre- 
sponding eigenvectors calculated, we can implement them 
into the MUSCL-Hancock scheme in-line with the approach 
taken bv lStone et al.l (|2008l ) which we outline below. First we 
compute the left, right and central differences of the primi- 
tive variables using the cell centred values within each cell, 
where i is the cell index. 



5wL,t = Wi - Wi_i , 

5wc,, = (wi+1 - w,_i)/2 , 
S-WR^i = Wi+i — Wi 



(19) 
(20) 
(21) 



We then map them onto the characteristic variables using 
the left eigenvectors calculated using (jlSp . where L(w)i is 
the matrix corresponding to the i*'' row of the left eigenvec- 
tor matrix, and the primitive variables Wi are defined at the 
cell centres, 



SuL.i = L(Wi) ■ S-WL,i , 

Suc.i = L(wi) • 5-wc,i , 



(22) 
(23) 
(24) 



As the scheme is higher than first order, the reconstruc- 
tion of the time averaged variables at the cell interfaces, can 
lead to the formation of new non-physical local maxima or 
decreasing minima. To prevent their formation which oth- 
erwise leads to spurious post shock oscillations, we apply 
monotonicity constraints to the characteristic differences, 
ensuring that the time aver aged variables are total varia- 
tion diminishing (TVD, see lLeVeaue|[200^ 'l. 

(Ju™ = sgn((5uc,0 inin(2|(5uL,i!, |(5uc,i|, 2|(5uH,i|) . (25) 

The monotonised differences are then projected back onto 



the primitive variables, where R(wi) is the matrix corre- 
sponding to the i"* column of the right eigenvector matrix 



dw, = ou. 



R(i 



(26) 



We now calculate the values of the primitives at the left, 
i — 1/2, and right, i + 1/2, cell interfaces using the primitive 
variable monotonised differences. 



WL,i+l/2 = Wi - 
Wfl,i_l/2 = W, - 



--max{X, ,0) — 

maxyXj , u)— ^ 

2 ^ ' '2Sx 



r n 
OWi 



(27) 
(28) 



where Af^ and are the largest and smallest eigenvalues 
based upon the cell centred data. 

We can now perform the characteristic tracing step, re- 
moving from the quantities calculated in equations (|27|) and 
(|28[1 . contributions from each set of waves that travelled 
towards the interface but did not reach it during the for- 
ward half-tim e-step 5t/2 using (|Colella fc Woodward! [1983 : 
IColellal[l990h . 



WL,i+i/2 = Wi,i+i/2 + ^ ^ [(Af -- A°)L°' ■ 5wr] R." , (29) 



Wfl,,_i/2 = Wfl,»-i/2 + ^ ^ [(A? - A")L° ■ (Jw™] R? . (30) 

With time averaged primitive values calculated for each 
cell face, Godunov's fluxes can now be computed and ap- 
plied using a conventional Riemann solver technique such as 
the accurate Roe scheme. If using an approximate Riemann 
solver such as the HLL scheme, then due to the averaging 
of the intermediate eigenstates between the fastest left and 
right waves, it is necessary to add the additional terms H31|) 
and (|32|) shown below, to equations (|29|) and (|30|l respec- 
tively. 



WL,i+i/2 = ^ [(A" - Af )L" • 5wrj R" , (31) 

A°<0 

Wb,.-i/2 = J2 [(^? - ■ ^^^] • (32) 



A<»>0 

These remove the effect of waves which propagate away 
from the cell interface which would otherwise be included, 
allowing for the scheme to achieve second order accuracy. 

Through the use of operator splitting, we can deal with 
the addition of extra source and sink terms, including grav- 
ity, separately after the coUisionless stellar hydrodynamics 
step. To ensure that the gravitational acceleration term is 
kept second order accurate in time, we store the acceler- 
ation from the previous time step and interpolate forward 
to account for temporal variation in the gravitational field. 
Additional source and sink terms can include star forma- 
tion and supernova, although we only consider gravity in 
this paper. 
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4 TESTS 

Notwithstanding a formal similarity of the CHE and BME, 
testing the latter is nontrivial as it requires knowing analyt- 
ical solutions for the dynamics of collisionless systems. For- 
tunately, some of the test problems that are used to bench- 
mark the performance of hydro codes can be adapted to test 
the collisionless systems. This is because equations (l2])-(|4]) 
can be made formally identical to those of classical hydrody- 
namics in special cases. Indeed, for a one-dimensional flow 
of a non-gravitating collisionless fluid along the a;-direction, 
the BME become 
dp 
dt 



dt 



d_ 

dx 
d_ 

dx 







(33) 
(34) 
(35) 



If we now assume that pcr^j. is equal to the gas pressure 
P in classic hydrodynamics and e = \pa^^ then as P = 
(7 — l)e = 2e, i.e. the ratio of specific heats 7 is equal to 3, 
then equations ((33])-((35| turn into the following form 



dp 
dt 



^(P-X) 



+ 



^(p-x) = 0, 

^ [pu^ ■ Itx) = 

dx 



dP 

dx 



d_ 

dx 



e + \pul + P 



= 



(36) 
(37) 
(38) 



which are exactly one-dimensional classic hydrodynamics 
equations. The fact that the BME for a one-dimensional 
flow turns into the CHE for 7 = 3 is not a coincidence but is 
related to the fact that a one-dimensional flow of a collision- 
less fluid has only one translational degree of freedom i = 1, 
unlike a collisional fluid which can still be characterised by 
1 = 3 due to frequent collisions that equipartition the kinetic 
energy of particles between the three translational degrees of 
freedom. This means that in order to properly describe the 
one-dimensional flow of a collisionless fluid using the clas- 
sical hydrodynamics approach, one needs to assume that 
the fluid has only one translational degree of freedom, i.e. 
7 = (i -I- 2)/i = 3. It is straightforward to show that for 
a two-dimensional flow of a collisionless fluid, one needs to 
consider a specific case with pcr^^ = P and p<Tyy ~ P and 
also 7 = 2, i.e. a flow of a collisional fluid with two degrees 
of freedom. 




0.0 0.2 0.4 0.6 

Distance 



Figure 2. Comparison of the analytic (solid lines) and numerical 
(open circles) solutions of the one-dimensional Sod shock tube 
problem. The latter is obtained using the Riemann solver. See 
the text for more details. 
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4.1 Sod shock tube 

The above analysis allows us to use the standard Sod 
shock tube problem, often employed to benchmark colli- 
sional hydro codes. To generate an analytic solution to a 
one-dimensional discontinuous state along the x-coordinate, 
we choose 7 = 3 and use the standard solution procedure 
l|Sodlll978l ). For a two-dimensional discontinuous state in the 
X — y plane at an arbitrary angle a to the a;-direction, an 
analytic solution is obtained by choosing 7 = 2. 

In the ID case, we run the test along the x-coordinate 
and set pa^^ and p at x G [0 — 0.5] to 1.0, while at x £ 
[0.5 — 1.0] the x-component of the stress tensor is set to 0.1 
and the stellar density is 0.125. The numerical resolution 
is 200 grid zones and the cells are equidistantly spaced. In 



Figure 3. The same as Fig. [2] only for the two-dimensional Sod 
shock tube problem. 



the 2D case, we align the initial fluid discontinuity along 
the [1,1,0] plane and use the same densities and velocity 
dispersions as were set for the ID setup setting pa^^ = po-yy 
in order to yield agreement with the results for an isotropic 
collisionally dominated gas. This yields a shock wave which 
propagates at 45° to the x and y axes. Again we use 200 
cells in the x and y directions and note that the setup can 
easily be extended to three dimensions although for brevity 
we do not show these here. 

Figures [2] and [3] show the results of our tests for the ID 
and 2D setups, respectively, at t=0.1 using the HLLC Rie- 
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ID test problem (KT scheme) 
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Figure 5. Comparison of tlie obtained numerical solution (dia- 
monds) to the analytic solution (solid line) for the collapse of a 
homogeneous pressureless sphere after 0.985 free fall timescales. 



Figure 4. The same as Fig. [2] only for the KT scheme. 



mann solver. In particular, the top-left, top-right, bottom- 
left, and bottom-right images show the density, velocity, the 
x-component of the stress tensor pa^j,, and cr^xKl — !)• The 
latter two quantities may be regarded as analogues to the 
gas pressure P and internal energy density e in classical 
hydrodynamics. In the ID case, the plotted velocity is u^, 
whereas in the 2D case we plot the absolute value of veloc- 
ity (u^ + Uy) along the diagonal x = y, i.e., along the 
direction of the shock wave propagation. It is evident that 
the Riemann-solver-based numerical scheme can trace the 
shock waves (propagating to the right) very well, usually 
within one to two grid zones. Contact discontinuities (prop- 
agating to the left) are smeared to a greater extent, usually 
by 5-6 grid zones. Nevertheless, the position of the shock 
waves and contact discontinuities along with the values of 
the flow variables are reproduced quite accurately. 

In a conventional coUisional shock, the energy of the 
macroscopic motion of the gas is transfered through col- 
lisions into random isotropic kinetic motions of the parti- 
cles (thermal energy). In a coUisionless fluid, the energy of 
macroscopic motion is transfered into an effective increase 
in the velocity dispersion as stellar streams with different 
velocities mix at the shock front. 

Open circles in Figure |4] present the results of the ID 
Sod shock tube test using the KT scheme. It is evident that 
this scheme does not provide as good an agreement with 
the analytic solution as the HLLC Riemann solver, partic- 
ularly for the velocity dispersion (bottom-right panel). In 
the KT scheme, the shock fronts and contact discontinuities 
are smeared out over roughly twice as many grid zones as 
in the HLLC scheme. However encouragingly, the positions 
of the shocks and discontinuities and also the values of the 
flow variables are still accurately reproduced. All in all, the 
HLLC scheme shows an undoubtedly superior performance 
though at a higher CPU cost, with the KT scheme typically 
taking around 22% less time to run and a near identical 
amount of memory. 



4.2 Collapse of a pressure-free sphere 

The gravitational collapse of a pressure-free sphere is used 
to assess the code's ability to accurately treat converging 
spherical flows on the Cartesian grid. This test is also useful 
for estimating the performance of the gravitational poten- 
tial solver on dynamical problems for which one has to use a 
finite-difference form of the gravitational potential gradient. 
Since the test setup involves no stress tensor, the result- 
ing equations are identical in the coUisionless and coUisional 
fluid dynamics cases and one can use an analytic solution 
describing the collapse of every mas s shell in the limit of an 
infinite sphere radius ()Hunteilll962[ ). 

To run this test, we set a cold homogeneous sphere of 
unit radius and density (for convenience, the gravitational 
constant is also set to unity) and let it collapse under its 
own gravity. We use a block size of 16^ cells and six levels of 
refinement leading to an effective grid size of 512^ cells. We 
adopt isolated gravitational boundary conditions. Unfortu- 
nately, we must consider a cloud of finite radius in Cartesian 
geometry with a sharp outer boundary to preserve the cloud 
sphericity. As a result, a rarefaction wave develops after the 
onset of the collapse, propagating towards the coordinate 
origin and necessitating complicated corrections to the ana- 
lytic solution. 

Figure \5\ compares the results of our numeric al simu- 
lation with the "uncorrected" analytic solution of iHuntetj 
([l96^ (solid line) at 0.985 free fall timescales, when the 
initial density has increased by nearly three orders of mag- 
nitude. The majority of cells within the sphere lie upon the 
analytic solution indicating a good homologous collapse with 
only a small peak above the analytic solution forming within 
the very central few cells. In addition to this small peak, the 
initially sharp boundary of the cloud is smeared out over 
several cells as a result of the rarefaction developing. The 
analytic radii of the cloud however shows reasonable agree- 
ment with the radii at which the half peak density is reached. 
In general, our results are found to be in strong agreement 
with those of other authors ap plying this test problem to 
the classic hydrodynamics (e.g.. lStone fc Norman|[l992l '). 
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sotropic P Anisotropic P 




Figure 6. Plots of the O.^pmax contour every 0.1 free fall 
timescales for collapsing spheres of initially homologous, spher- 
ical mass distributions. The left hand panel has an isotropic pres- 
sure tensor and thus maintains expected sphericity as it collapses, 
whilst the sphere in the right hand panel deforms over time due 
to the initial asymmetry in the dispersion tensor, yielding a pro- 
nounced ellipse. 
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Figure 7. EUipticity of the collapsing cloud with an anisotropic 
dispersion tensor, as a function of time. The ellipticity, the ratio 
between the minor and major axes, is determined for different 
iso-density surfaces; O.dpmax (solid line) and 0.5pmax (dashed 
line). The major and minor axes begin initially as the z and x 
axes respectively, but switch after material first reaches maximum 
compression within the core and rebounds at t 0.9tff, with 
material being preferentially expelled along the x-axis. 



4.3 Collapse of a sphere with anisotropic stress 
tensor 

As a third test, to demonstrate that the code can model the 
formation of anisotropies, we initialise two copies of the same 
sphere as in 14. 21 with a non-negligible dispersion tensor. In 
one we initially set an isotropic dispersion tensor whilst in 
the second we initially weight the different diagonal terms 
f^xx '■ ^yy ■ f^zz with the ratio of 1 : 2 : 3. We set the system 
up so that the initial maximum effective dispersion energy is 
only 25% of that needed to support the sphere against grav- 
itational collapse. This allows the sphere to collapse down 
from its initially symmetric configuration and deform into 
an ellipsoid. The velocity dispersion is then free to change 
as the simulation evolves. 

We plot in figure[6]the density contour for p — 0.9pmax 
at regular time intervals for both the cloud with an isotropic 
dispersion tensor (left hand panel) and an anisotropic dis- 
persion tensor (right hand panel). The isotropic dispersion 
tensor leads to a collapse which is isotropic as expected 
whilst in the anisotropic case, the collapse proceeds quickest 
in the direction with the lowest dispersion term. This leads 
to a flattening of the sphere into an ellipse. 

In figure [7] we plot the ellipticity of the collapsing 
anisotropic cloud as a function of time. The ellipticity is de- 
termined to be the ratio between the minor and major axes 
of the ellipsoid over a specified iso-density surface. The solid 
line corresponds to the ellipticity of the iso-density surface 
of O.Qpmax and the dashed line is the ellipticity for 0.5pmax- 

Initially, since the velocity dispersion is lowest in the 
x-direction, material along that axes collapses fastest, with 
the ellipticity increasing as the density builds. This leads to 
a greater effective pressure, Pxx = pc^x^ from the velocity 
dispersion. Eventually the core collapses to a density and 
pressure sufficient to stop further collapse. At this point the 
sphere begins to isotropise as material from the other two ax- 
ial directions catches up and falls onto the core. This collapse 
of material onto the core, drives up the pressure and material 



is forced out around t ~ 0.9t// along the x-direction where 
least resistance is offered. This again increases the elliptic- 
ity and results in the major and minor axes of the ellipse 
switching places as the x-axis becomes the major instead 
of the minor axis and vice versa for the z-axis. The sys- 
tem then oscillates between a high degree of ellipticity and 
near spherical symmetry as the different axes of the ellipsoid 
collapse and rebound at different times and speeds. These 
oscillations repeat over many free fall timescales. Material at 
a lower density follows the same cyclical pattern of varying 
ellipticity but with less pronounced maxima and minima. 
As a result of the non-linear nature of the collapse and its 
relative separation from the violent oscillations within the 
core, transitions at lower density are also smoother. 

4.4 Spiral instability in a stellar disc 

The final test problem describes the growth of a spiral struc- 
ture in a gravitationally unstable stellar disc. The stability 
analysis of a thin stellar disc to a local axisymmetric per- 
turbation stat es that the disc is gravitationally unstable if 
(jToomrdll96i ) 



Qt 



< 



1.0, 



(39) 



3.36GE 

where is the radial stellar velocity dispersion, k is the 
epicycle frequency, and E is the stellar surface density. A fi- 
nite disc thickness and non-axisymmetric perturbations may 
increase the critical Toomre parameter Qc by a factor of 
unity. 

The initial setup consists of a self-gravitating, rotating 
stellar disc submerged in a fixed dark matter (DM) halo. The 
latter is described by the quasi-isothermal density profile of 
the form 

Po 



where zu is the galactocentric distance. The central density 
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Figure 8. Initial radial distributions of the rotation velocity 
(solid line), stellar surface density (dashed line) and Toomre Q- 
parameter (dash-dotted line) of the model disc. 



po ~ 1.24 X 10~^ Mq pc""' and the characteristic scale length 
of the quasi-isothermal halo ro = 3.2 kpc are calculated 
based on the assumed DM halo mass Mdm = 2 X 10" M© 
and halo radius , rh = 124.7 kpc, using relations provided in 
IVorobvov et~aLl l|2012l ). 

The computational box spans 30 kpc in each coordinate 
direction {x,y,z). The numerical procedure for generating a 
rotationally supported stellar disc in the combined gravita- 
ti onal potential of stars and DM halo is described in detail 
in lVorobvov et aTl (|2012h . The rotation curve, the radial gas 
surface density profile and the radial profile of the Toomre 
parameter Qt of the initial stellar disc are plotted in Fig. [H] 
with the solid, dashed, and dot-dashed lines, respectively. 
Evidently, the disc is initially gravitationally unstable. 

Figure |§] presents a series of disc images at several suc- 
cessive times since the beginning of numerical simulations. 
Usually, an initial seed perturbation is required to drive the 
system out of equilibrium and initiate the growth of a spiral 
structure. However, in our case, the initial perturbation is 
introduced due to a re-map of the initial e quilibrium con- 
figur ation from the cylindrical coordinates (jVorobvov et al.l 
1201^ onto the Cartesian coordinates (the FLASH code). 
The growth of a two-armed spiral mode is apparent in the 
figure. 

To quantify the growth rate of spiral modes in our 
model, we employ global Fourier amplitudes defined as 

+X +Y 



Ti{x, y, t)e' dxdy 



(41) 



where m is the spiral mode, Ti{x, y, t) is the surface den- 
sity of the stellar disc, = arctan(j//x) is the polar angle, 
and A'/d is the total mass of the stellar disc. The amplitudes 
are calculated on a square region with size {—X : X, —Y : 
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Figure 9. Disc images showing the growth a a two-armed spiral 
pattern over time. For each snapshot, the difference between the 
initial azimuthally symmetric disk surface density profile and the 
current profile is shown. The colour bar is in units of Mqpc~^ and 
the time is shown in the top right of each figure, corresponding 
to 190, 290, 335, 360, 380 and 400 Myrs. 
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Figure 10. Global Fourier amplitudes as function of time elapsed 
since the beginning of numerical simulation. 



Y) = (—2 : 2, —2 : 2) kpc, which is centred on the coordi- 
nate origin and encompasses the growing spiral. The global 
Fourier amplitudes can be regarded as the amplitude of a 
spiral density perturbation relative to the surface density of 
the axisymmetric disc. 

Figure [TO] shows the temporal evolution of the first six 
global Fourier amplitudes (in log units) in our model stellar 
disc. Initially all amplitudes are negligibly small except for 
that corresponding to the m = 4 mode. This is a numeri- 
cal artifact and arises due to t he finite resol ution within the 
core. Similar to the results of iTasker et 3 (2008) in which 
they model a static King-profile that is initially in hydro- 
static equilibrium, the finite resolution within the core leads 
to an under-resolved gravitational potential. This results in 
a slight relaxation and expansion of the very central region. 
An outward flow of material results through the faces of 
the central most cell and due to the fact that our disk is 
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constructed so as to promote the growth of gravitational in- 
stabilities, this generates a smaU yet non-neghgible pertur- 
bation. Although similar behaviour may occur in all mesh 
codes, those using Cartesian geometries are particularly sus- 
ceptible given their strong natural bias for flows along the 
axes of the mesh. Despite being present from the very onset 
of the simulation, the m = 4 mode slowly decays and does 
not lead to the growth of a spurious four armed spiral given 
the dominance of the m = 2 mode at late times by around 
two orders of magnitude. 

The m = 2 mode demonstrates the fastest growth rate. 
It saturates at around t « 370 Myr and declines slowly in 
the subsequent evolution due to continuing disc heating. The 
maximum amplitude of the m — 2 mode is about 10~^'^^, 
which suggests a non-axisymmetric density perturbation of 
about 2% relative to the underlying axisymmetric disc. 

An appealing physical interpretation for the growth of a 
spiral structure in our model disc is swing amplification. Am- 
plification occurs when any leading spiral disturbance, e.g., 
introduced by the initial seed perturbation, unwinds into a 
trailing one due to differential rotation (e.g. iToomrd Il98ll : 
lAthanassoulalll984l : IVorobvov fc Theilbood l2008O n addi- 
tion, swing amplification needs a feedback mechanism that 
can constantly feed a disc with leading spiral disturbances. 
Trailing short- wavelength disturbances propagating through 
the disc centre and emerging on the other side as leading 
ones provide a feedback for the swing amplifier l|Toomrel 

We can check if the growth mechanism of our spiral 
structure is consistent with the predictions of the swing am- 
plification theory. For our spiral pattern to be triggered by 
the swing amplifier, there should be no inner Lindblad reso- 
nance (ILR) for the m — 2 mode. Otherwise, the correspond- 
ing trailing disturbances will damp at the ILR, thus failing 
to pass through the disc centre and promoting the growth 
of the m = 2 mode. Figure [TT] shows the radial profiles of 
n and Q ± n/m a.t t — 330 Myr, where Q, is the angular 
velocity of the stellar disc and n is the epicycle frequency. 
Both quantities are the azimuth averages. In particular, the 
solid and dashed lines show the corresponding values for the 
m — 2 and m — 3 modes, respectively. The dash-dot-dot- 
dotted and dash-dotted lines show the angular velocity Qp 
of the global spiral pattern measured at two distinct posi- 
tions in the disc: 2.0 kpc and 4.0 kpc. In theory, Qp should 
be independent of radius, though in practice it is always 
slightly faster in the inner regions than in the outer ones due 
to gradual winding of a spiral pattern. The radial position 
of Lindblad resonances are determined as m(f2 — fip) — ±k, 
i.e., as the radial distance where the forcing frequency of the 
spiral pattern coincides with the epicycle frequency of the 
stars. In particular, the inner and outer Lindblad resonances 
correspond to the plus and minus signs, respectively. 

Figure [TT1 demonstrates that there is clearly no ILR for 
the m — 2 mode, whereas there is one marginally for the 
m = 3 mode (and thus for any higher-order mode). Indeed, 
the dash-dotted and dash-dot-dot-dotted lines never inter- 
sect the fl — K/2 curve, whereas they do intersect the SI — k/3 
curve at r < 1 kpc, thus damping m = 3 disturbances that 
try to pass through the disc center. Although slight growth 
over time of the m — 3 mode is observed in figure 1101 it 
is damped relative to the m — 2 mode by over two orders 
of magnitude. Higher order modes for which a clearer ILR 
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Figure 11. Radial behaviour of Q, il±K/m,, and f2p in our model 
stellar disc, where Q is the azimuthally averaged angular velocity 
of stars and f2p is the angular speed of the spiral pattern. The 
dash-dot-dotted and dash-dotted lines show Qp as measured at 
two distinct positions in the disc; 2.0 kpc and 4.0 kpc respectively. 
The intersections of O ± re/m and Qp determine the positions of 
Lindblad resonances for the corresponding spiral mode m. 

will be present are damped to a much greater extent. This 
explains why the m = 2 mode (and not any higher-order 
mode) ultimately dominates the spiral pattern. 

Another consistency check on our model is the position 
of the outer Lindblad resonance (OLR). Spiral disturbances 
cannot propagate through the OLR, which effectively limits 
the radial extent of a spiral pattern. Figure [Tl]indicates that 
the OLR for the m = 2 mode is located approximately at 
4 kpc and this value is in agreement with the radial extent of 
the spiral pattern in Fig. |9] We conclude that our numerical 
results are in general agreement with the predictions of the 
swing amplification theory. 



5 DISCUSSION AND SUMMARY 

In this paper we have outlined a series of important lim- 
itations that are encountered in mesh codes when using 
the particle-mesh technique. Used to consistently integrate 
coUisionless stellar or dark matter components into simula- 
tions which model gas physics on an Eulerian mesh, the 
particle-mesh technique has the advantage of being rela- 
tively straight forward to implement but is unfortunately 
subject to several key limitations. From a technical perspec- 
tive these include poor load balancing and increased commu- 
nication overhead. From a physical point of view, discrete- 
ness effects incurred when discrete particle properties are 
mapped to the mesh, are believed to result in spurious en- 
tropy generation. In galaxy formation simulations, this can 
have a significant influence on the reservoir of cold gas avail- 
able to form stars, potentially changing the entire dynamics 
of the system. 

To overcome these limitations we have proposed the use 
of the coUisionless Boltzmann moment equations as a pow- 
erful alternative. Such an approach allows us to model the 
collective properties of coUisionless objects such as stars as 
a fluid, instead of relying upon the traditional N-body ap- 
proach. In this paper we refer to this approach as "coUi- 
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sionless stellar hydrodynamics" given its primary use in the 
modelling of the properties of large collections of stars, such 
as those in a galaxy, in much the same way as we model 
gases in Eulerian grid codes. Although direct integration of 
the coUisionless Boltzmann equation has been undertaken 
by other authors, it is often prohibitively expensive to per- 
form. This arises as it is necessary to track both the spatial 
dom ain and its corre s pondi ng velocity phase space. How- 
ever lYoshikawa et al.1 (|2012l ) demonstrates that the ability 
to resolve small scale velocity fluctuations is much greater 
in these codes than for conventional N-body simulations. 

Instead of a costly direct integration of the Boltzmann 
equation, we opt to derive the zeroth, first and second order 
moments under the usual zero heat flux approximation and 
taking the third order moments to be negligible. Although 
ultimately an approximation, we show that this yields re- 
sults in-line with a series of demanding test problems, re- 
quiring a factor of 32'^ less memory than is needed using 
direct integration over the complete velocity phase space. 

We implement these new coUisionless stellar hydro- 
dynamic equations into the massively parallel FLASH 
AMR code using two different numerical solvers for hy- 
perbolic partial differential equations. The first scheme by 
iKurganov fc Tadmorl (|200di), requires no information on the 
equation of state of a coUisionless fluid or any assumptions 
about the behaviour of fluid interactions within the Rie- 
mann fan. This Riemann solver free prescription allows us 
to track sharp discontinuities with relatively low numerical 
diffusion and provides a very useful comparison with which 
to compare our more sophisticated second scheme. In future, 
its extensibility is hoped to be used to explore the impact of 
including the higher order moments which we exclude here. 

By assuming a diagonalised dispersion tensor, we have 
derived the eigenvalues and eigenvectors for the first three 
moments of the coUisionless Boltzmann equations. These 
have been implemented into a characteristic tracing method 
based on the MUSCL-Hancock unsplit advection scheme. 
This allows time averaged primitive states to be calculated 
at the cell interfaces using characteristic tracing, from which 
time averaged fluxes can be computed using a range of Rie- 
mann solvers. 

We validate our code using a suite of tests with analytic 
solutions, including some commonly used to benchmark con- 
ventional hydrodynamic codes. By realising that the classic 
hydrodynamic equations and Boltzmann moment equations 
can be reconciled under certain conditions, we are able to 
generate analytic solutions for the Sod shock test which are 
applicable for coUisionless systems. Of particular note, we 
find that since a coUisionless fiuid has only one degree of 
freedom in a given direction and is unable to equipartition 
the kinetic energy of particles between the three dimensions 
through collisions, it has an effective ratio of specific heats, 
7 = 3. Thus in ID, a Sod shock solution for a collisionally 
dominated gas with 7 = 3, matches that for a coUision- 
less fluid. Using this insight, we confirm that our numeri- 
cal schemes; both the KT scheme and characteristic tracing 
method, accurately reproduce analytic solutions. Although 
the KT scheme runs 22% faster with near identical mem- 
ory consumption, our Riemann solver based scheme shows 
notably better resolving power of sharp shocks and discon- 
tinuities, spreading them over roughly half the number of 
cells. 



Through the use of a spherical pressure-free collapse 
problem, we confirm that the code can maintain good 
fiow symmetry in convergent fiows in the absence of any 
anisotropy in the dispersion tensor. Agreement with the 
peak density within the collapsed sphere also confirms the 
validity of the Poisson solver used in FLASH. Through the 
inclusion of a non- negligible anisotropic dispersion tensor, 
we extend the spherical collapse to observe the behaviour 
of an initially homologous spherical cloud as it collapses 
and deforms into an elliptical profile. Although initial be- 
haviour of the system occurs in-line with expectations, the 
long term behaviour is more complicated with repeated tran- 
sitions between an elliptical and spherical profile, along with 
switching of the major and minor axes after the initial core 
implosion. Although the inclusion of gas physics may help 
to isotropise the internal structure of the cloud and damp 
these oscillations, they will nevertheless remain important 
to the dynamics outside of the core. We will explore in more 
detail interactions between the collisionally dominated gas 
physics and the stellar material in future papers. In particu- 
lar, we will extend our scheme to include off-diagonal terms 
and a more thorough exploration of the effects of higher 
order terms. Although powerful in its current form, the ad- 
dition of off-diagonal terms in the velocity dispersion tensor 
will allow us to directly measure the vertex deviation, a key 
observable used for probing galactic structure. To date ob- 
servers have made extensive measurements of our own Milky 
Way's vertex deviations which can hide detailed information 
about the kinematic properties of the bar and spiral arms. 
However noise in particle based simulations has made ac- 
curate theoretical predictions difficult, something which our 
new coUisionless stellar hydrodynamics code can allow us to 
overcome. Having already confirmed the applicability of the 
coUisionless stellar hydrodynamics code to the formation of 
spiral structure, we have found excellent agreement between 
our numerical simulations of the growing spiral pattern and 
the predictions of the swing amplification theory, with the 
correct relation between the disc size and the outer Lindblad 
resonances being observed. 

As a final note, we find the time taken in FLASH for 
the Riemann solver based coUisionless stellar hydrodynamics 
routine is only 7% greater than that for the original classic 
hydrodynamic scheme, allowing the coUisionless stellar hy- 
drodynamics to scale linearly with the classic hydrodynam- 
ics. We also find that instead of doubling the overall amount 
of time spent communicating when both gas and coUisionless 
stellar hydrodynamics are included, our grouping of commu- 
nications results in a net increase of only 1.2% to the overall 
communication time compared to that of just a single phase 
hydrodynamic scheme. Thus we conclude by confirming that 
our new coUisionless stellar hydrodynamic approach using 
the Boltzmann moment equations both preserves the excel- 
lent scaling previously demonstrated in FLASH as well as 
enhancing the level of detail we can expect to be able to ex- 
tract from our results, with none of the discreteness effects 
observed for particle-mesh techniques. 
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APPENDIX A: CONVOLVING THE 
BOLTZMANN MOMENT EQUATIONS TO THE 
CLASSIC HYDRODYNAMICS FORM 

On general grounds one can argue that the Boltzmann mo- 
ment equations ((H-Q should convolve to the classic hydro- 
dynamics equations (ClfE) if the stress tensor Hij is diag- 
onal and isotropic. In this section we demonstrate this by 
assuming that Uij = for i / j and IIxx ~ Hyy = fizz, 
implying that ctxx = cTyy = Czz = cr. The equation of conti- 
nuity ((2} is identical to that of the CHE and the momenta 
equations (jS} trivially turn into their coUisional counterparts 
with a substitution IIxx = po"xx = pcr^ ~ P (and identical 
relations for other components), where P is the gas pressure. 

It takes a bit more math to show that the dispersion 
equations Q can be convolved to the equation for the total 
energy per unit volume Et: — e + p\u\^ jl. Let us expand 
equation @ into the component form in Cartesian coordi- 



nates. 






aSxx 




d 


dt 


dxk 


dEyy 




d 


dt 


dxk 


aszz 


+ 


d 


dt 


dxk 



EzzUk + 2p(JzfcUz) — ipu^a^ = . (A3) 
After summing up equations (|Aip - (|A3P we obtain 

+ 



_9 /3 

dt 



a 

dxk 
pUiOi — 



(A4) 



where we have divided the resulting equation by a factor of 
2 and have taken into account that CTxx = cryy = ctzz ~ u 
and \u\'^ — u'^ + Uy + u^. In the classical hydrodynamics the 
gas pressure relates to the internal energy density via the 
relation P — 5(7 — 1), where 7 = 5/3 for a flow with three 
translational degrees of freedom, meaning that fpf^ — e 
in the above equation. The final step is to note extract the 
diagonal part from the symmetric tensor pafj. by introducing 
the viscous stress tensor -nik. 



pafk = PSik - {P3ik - po-ffe) = PSik - TVik 



(A5) 



With this final transformation, equation HA4|) turns into the 
usual hydrodynamics equation for the total energy per unit 
volume Et of a viscous fluid. 

OEt d 

+ [itfc (-Et + P) - T^ikUt] - pUiUi = . (A6) 



APPENDIX B: ZERO-HEAT-FLUX 
APPROXIMATION 

When deriving the Boltzmann moment equations, each 
equation for the n-th moment inevitably requires the knowl- 
edge of the (n -I- 1) moment. This chain is convention- 
ally closed by setting the third-order moments to zero, i.e., 
Qijk = P~V ~ wOfe - - Uk)d^v = 0. In this 

section, we discuss the validity of this approximation. 

For this purpose it is useful to refer to a similar prob- 
lem in the coUisional hydrodynamics, for which the closure 
problem exists as well. Indeed, the CHE are derived from 
the coUisional Boltzmann equation and if the zero-heat-flux 
approximation were relaxed, then equation (|A6|I would have 
an additional term 

i^(pQ.,.) . 

A contemporary swindle is to use the empirical Fourier law 
for thermal conduction and assume that 

dT 



;PQ 



ijk 



-fc(T) 



dxk 



where T is the gas temperature and k{T) is the heat con- 
duction coefficient. Unfortunately, such a trick cannot be 
applied to a coUisionless fluid as it, strictly speaking, has no 
temperature due to the lack of local thermal equilibrium. 

However, this analysis can give us an insight as to 
when the zero-heat-flux approximation cannot be neglected. 
In the CHE, such a situation arises on a quasi-stationary 
contact discontinuity between two gases with vastly dif- 
ferent temperatures when the thermal diffusion time scale 
Tth = pL^/k{T) can be much shorter than dynamical one 
Tdyn = L/\u\, where L is the characteristic size of the sys- 
tem. In most astrophysical environments, however, the bulk 
motions of gas ensure that rdyn <S Tth and the third-order 
moments can be neglected. 

In the case of coUisionless systems, the diffusion time 
scale is played by the typical relaxation time Trd = L/a, 
which should be compared against Tdyn. It is apparent that 
the zero-heat-flux approximation is expected to be valid if 



Trcl 
Tdyn 



> 1 



(Bl) 



This condition is usually met in any r otationally supported 
system like a galactic stellar disc, with lHensler et all (|l995l ) 
highlighting stellar dynamic models which show the third 
order moments decay faster than the relaxation timescales. 
This means that the zero-heat-flux approximation is ap- 
proached faster than isotropisation. However, this condition 
may break in globular clusters that are thought to be sup- 
ported against gravitational collapse by random motions of 
stars (i.e., by high velocity dispersions). This does not inval- 
idate the whole Boltzmann moment approach approach, but 
simply means that an important piece of physics is missing 
and needs to be taken into account. We work on developing 
useful approximations that can treat such cases in collision- 
less fluids. 
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